Numbat Walkthrough
library(ggplot2)
library(numbat)
library(dplyr)
library(conos)
library(logger)
library(stringr)
library(magrittr)
library(glue)
library(data.table)
library(patchwork)Interperting Numbat outputs
In this tutorial, we will illustrate how to use Numbat to reconstruct the single-cell copy number profile and tumor phylogeny from scRNA data, using an aplastic thyroid cancer dataset (ATC2) from Gao et al. Numbat produces a number of files in the output folder. Let’s load the results using this utility function:
# loading numbat results
outdir = '~/results/test'
res = fetch_results(outdir, i = 2)
# loading conos object for visualization
con_ATC = readRDS('~/external/MDA/conos_ATC.rds')
con_sample = con_ATC$samples[['ATC2']]Numbat first approximates a single-cell phylogeny using window-smoothed average expression values. We can visualize the averaged expression profile and the initial phylogeny as follows:
gexp.norm.long = fread(glue('{outdir}/gexp.norm.long.tsv.gz'))
hc = readRDS(glue('{outdir}/hc.rds'))
plot_sc_roll(
gexp.norm.long,
hc,
k = 3,
n_sample = 300
) This view is similar to inferCNV and CopyKat. It appears that we have three main clusters: clusters 1 and 2 seem to harbor amplifications on chromosome 5, 7, 19. Cluster 3 does not have obvious CNVs and is presumably the normal cells. However, it is not clear what the CNV events are (their boundaries and copy number states) and how the two tumor populations are related to each other (are they distinct lineages or is one the ancestor of the other?). Let’s use Numbat to get a more detailed view of the copy number landscape in this dataset.
HMM
Numbat iteratively detects CNVs using a haplotype-enhanced HMM and reconstructs the tumor phylogeny using maximum likelihood using the detected CNV events. Let’s first see what events are detected from the HMM in the last iteration:
res$bulk_clones %>%
plot_bulks(min_depth = 5) Interestingly, we observe widespread loss of heterozygosity in the majority of the chromosomes, which we could not see from the expression view. This also tells us that the likely diploid regions are on chromosome 5,7 and 19.
CNV posteriors
We can visualize the event-specific posteriors, in single cells:
plist = list()
# muts = c('8_1', '5_2', '19_1')
muts = unique(res$joint_post$seg_label) %>% gtools::mixedsort()
for (mut in muts) {
plist[[mut]] = con_sample$plotEmbedding(
alpha=0.8,
size=1,
plot.na = F,
colors = res$joint_post %>%
left_join(res$clone_post %>% select(clone_opt, cell), by = 'cell') %>%
filter(seg_label == mut) %>%
{setNames(.$p_cnv, .$cell)},
show.legend = T,
mark.groups = F,
plot.theme = theme_bw(),
title = mut
) +
scale_color_gradient2(low = 'royalblue', mid = 'white', high = 'red3', midpoint = 0.5, limits = c(0,1))
}
wrap_plots(plist, guides = 'collect')It seems pretty clear that the the CNVs are restricted to a distinct cluster composed of tumor cells. Combinding evidence from all CNVs, Numbat derives an aneuploidy probablity for each cell to distinguish tumor versus normal cells. We can visualize the posterior aneuploidy probablity based on expression evidence only, allele evidence only, and jointly:
p_joint = con_sample$plotEmbedding(
alpha=0.8,
size=1,
colors = res$clone_post %>%
{setNames(.$p_cnv, .$cell)},
plot.na = F,
plot.theme = theme_bw(),
title = 'Joint',
) +
scale_color_gradient2(low = 'royalblue', mid = 'white', high = 'red3', midpoint = 0.5)
p_allele = con_sample$plotEmbedding(
alpha=0.8,
size=1,
colors = res$clone_post %>%
{setNames(.$p_cnv_x, .$cell)},
plot.na = F,
plot.theme = theme_bw(),
title = 'Expression',
) +
scale_color_gradient2(low = 'royalblue', mid = 'white', high = 'red3', midpoint = 0.5)
p_expr = con_sample$plotEmbedding(
alpha=0.8,
size=1,
colors = res$clone_post %>%
{setNames(.$p_cnv_y, .$cell)},
plot.na = F,
show.legend = T,
plot.theme = theme_bw(),
title = 'Allele',
) +
scale_color_gradient2(low = 'royalblue', mid = 'white', high = 'red3', midpoint = 0.5)
(p_expr | p_allele | p_joint) + plot_layout(guides = 'collect') Both expression and allele signal clearly seperates the tumor and normal cells.
Tumor subclones
Restricting our attention to the tumor cells, we see that although most events are shared by all tumor cells, some are only present in a subset of the cells:
plist = list()
muts = unique(res$joint_post$seg_label) %>% gtools::mixedsort()
for (mut in muts) {
plist[[mut]] = con_sample$plotEmbedding(
alpha=0.8,
size=1,
plot.na = F,
colors = res$joint_post %>%
left_join(res$clone_post %>% select(clone_opt, cell), by = 'cell') %>%
filter(clone_opt != 1) %>%
filter(seg_label == mut) %>%
{setNames(.$p_cnv, .$cell)},
show.legend = T,
mark.groups = F,
plot.theme = theme_bw(),
title = mut
) +
scale_color_gradient2(low = 'royalblue', mid = 'white', high = 'red3', midpoint = 0.5, limits = c(0,1)) +
guides(color = guide_legend(title = 'posterior'))
}
wrap_plots(plist, guides = 'collect')Interestingly, the two deletions on chr19 are mutually exclusive. Indeed, there appear to be multiple subclones in the tumor. We can visualize the distinct tumor subclones called by Numbat - note that clone 1 is always the normal cells.
pal = RColorBrewer::brewer.pal(n = 4, 'Set1')
p1 = con_sample$plotEmbedding(
alpha=0.8,
size=1,
groups = res$clone_post %>%
{setNames(.$clone_opt, .$cell)},
show.legend = T,
mark.groups = F,
plot.na = F,
plot.theme = theme_bw(),
title = 'Genotypes',
pal = c('gray', pal)
)
p2 = con_sample$plotEmbedding(
alpha=0.8,
size=1,
groups = res$clone_post %>%
filter(clone_opt != 1) %>%
{setNames(.$clone_opt, .$cell)},
show.legend = F,
mark.groups = F,
plot.na = F,
plot.theme = theme_bw(),
title = 'Genotypes',
pal = pal
)
p1 | p2 The cells from the tumor subclones seemed to largely separate in expression space.
Single-cell phylogeny
The subclones and their evolutionary relationships are discerned by Numbat phylogeny inference. Now we can visualize the detected CNVs and the single-cell phylogeny reconstructed by Numbat in an integrated view:
plot_sc_joint(
res$gtree,
res$joint_post,
res$segs_consensus,
tip_length = 2,
branch_width = 0.2,
size = 0.3,
clone_bar = T
) +
ggtitle('ATC2') The detailed mutational history constructed by Numbat can be visualized as well. Here the nodes represent clones (each carrying a distinct genotype) and edges represent mutational events.
plot_mut_history(res$G_m)